CONTEXT

Bellabeat is a high-tech company that has manufactured health-focused smart products since 2013.

Bellabeat wants insights into how smart devices are used in order to identify trends that can inform its marketing strategy.

ASK

Business task

Analyze smart device usage to gain insights into how people are already using their smart devices and provide high-level recommendations to improve marketing strategies for Bellabeat.

Business use case

  • Determine trends for smart device usage
  • How could these trends apply to Bellabeat ‘Time’ customers?
  • How could these trends help to improve Bellabeat’s marketing strategy?

PREPARE

For the analysis, I will use a custom survey (first-party data) to collect up-to-date data on smartwatch usage, and R with RStudio to perform the analysis.

Building the survey

Platform

Built in Google Forms and available in two languages: English and Spanish.

Distribution

Distributed through a link shared in a LinkedIn post, Instagram stories and WhatsApp groups. There were no restrictions on sharing the link.

Time frame

Available for 7 days, starting on 01/26/2024.

Configuration

Anonymous survey.

Questions

Demographic questions

  • Gender
  • Age
  • Geographic location

Smartwatch usage questions

  • Main usage: multiple choice, with the option to add a custom answer
  • Usage periodicity: scale from 0 to 7 representing usage in days
  • Brand: multiple choice, listing the main brands collected from sites like Amazon, Tiendamia and Mercado Libre, with the option to add a custom answer
  • Features used: checkboxes, listing the main features offered by smartwatches on the market, with the option to add a custom answer
  • Tracked functionalities: checkboxes, listing the main tracking options offered by smartwatches on the market, with the option to add a custom answer
  • Activities tracked: checkboxes, listing the main sport activities tracked by smartwatches on the market, with the option to add a custom answer

Evaluating data

Installing packages

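# Packages assumed from the functions called below; install them once with install.packages() if missing
library(tidyverse)  # readr::read_csv(), dplyr verbs, tidyr helpers
library(janitor)    # clean_names(), remove_empty()
library(skimr)      # skim_without_charts()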

Importing data

We need to import both survey files, the English and the Spanish version

swu_file_en <- read_csv("Smartwatch usage - english.csv")
swu_file_es <- read_csv("Smartwatch usage - Spanish.csv")

Get preview of imported data

We need to check if there is any error importing the files

head(swu_file_en)
head(swu_file_es)

Changing column names for both files

The two files have different column names because the survey was shared in different languages, although the columns hold the same information.

swu_es <- rename(swu_file_es, 
       'gender_dirty' = '¿Con qué genero te identificas?',
       'age_dirty' = '¿Qué edad tenés?',
       'location_dirty' = '¿Dónde vivís?',
       'usage_dirty' = '¿Cuál es el principal uso que le das a tu reloj inteligente?',
       'periodicity' = '¿Cuántos días utilizas alguna funcionalidad de tu reloj inteligente?',
       'brand_dirty' = '¿Cuál es la marca de tu reloj inteligente?',
       'features_dirty' = '¿Cuáles son las funciones que utilizas en tu reloj inteligente?',
       'functionalities_dirty' = '¿Qué funciones utilizas para realizar un seguimiento con tu reloj inteligente?',
       'activities_dirty' = '¿Qué actividades deportivas seguís con tu reloj inteligente?'
       )

swu_en <- rename(swu_file_en, 
       'gender_dirty' = 'What gender do you identify with?',
       'age_dirty' = 'How old are you?',
       'location_dirty' = 'Where do you live?',
       'usage_dirty' = 'Which is the main use you give to your smartwatch?',
       'periodicity' = 'How many days do you use any functionality of your smartwatch?',
       'brand_dirty' = 'Which is the brand of your smartwatch?',
       'features_dirty' = 'What are the features you use on your smartwatch?',
       'functionalities_dirty' = 'What features do you use to track with your smartwatch?',
       'activities_dirty' = 'What sport activities do you track with your smartwatch?'
       )

Binding files

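We need a single data frame to process the information.

# bind_rows() stacks both data frames and keeps every row; union() would drop any duplicated rows
swu_dirty <- bind_rows(swu_es, swu_en)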

Checking resulting dataframe

The resulting data frame should have as many rows as both files combined.

skim_without_charts(swu_dirty)
── Data Summary ────────────────────────
                           Values   
Name                       swu_dirty
Number of rows             621      
Number of columns          10       
_______________________             
Column type frequency:              
  character                9        
  numeric                  1        
________________________            
Group variables            None     
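
The row-count expectation above can be checked directly. The following is a minimal sketch on toy data (the two small tibbles are hypothetical stand-ins for the renamed survey exports, not real answers). It also illustrates why the choice of function matters: a set-style union() drops identical rows, whereas bind_rows() keeps every row.

```r
library(dplyr)

# Toy stand-ins for the two renamed survey exports (hypothetical values)
es <- tibble(gender_dirty = c("Masculino", "Femenino"), periodicity = c(7, 5))
en <- tibble(gender_dirty = c("Male", "Female", "Male"), periodicity = c(7, 3, 7))

stacked <- bind_rows(es, en)  # keeps every row, including the repeated ("Male", 7)
deduped <- union(es, en)      # set union: identical rows are collapsed into one

# The stacked data frame has the sum of rows of both inputs
nrow(stacked) == nrow(es) + nrow(en)
nrow(deduped)
```

In the real data the timestamp column makes fully identical rows unlikely, so both approaches may well return the same number of rows; the check is cheap either way.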

PROCESS

Preparing environment for cleaning data

General cleaning

We perform a general cleaning of the data:

  • clean the column names (convert them to snake_case)
  • remove empty rows and columns

swu <- swu_dirty %>% clean_names() %>% remove_empty(c("rows", "cols"))
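
To make the effect of this step concrete, here is a small sketch on a made-up data frame (the column names and values are hypothetical). It assumes the janitor package is installed and only illustrates what clean_names() and remove_empty() do.

```r
library(janitor)

# Hypothetical raw data: one fully empty row and one fully empty column
raw <- data.frame(
  `Main Usage` = c("Training", NA, "Health"),
  `Empty Col`  = c(NA, NA, NA),
  check.names  = FALSE
)

# Same two steps as above: snake_case names, then drop empty rows and columns
cleaned <- remove_empty(clean_names(raw), c("rows", "cols"))

names(cleaned)
dim(cleaned)
```

Only rows and columns that are entirely NA are removed; a row with a single missing answer is kept.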

Translating fields

Because the survey was shared in two languages (English and Spanish), many answers need to be unified so that equivalent answers count as a single value.

Cleaning columns

Gender

unique(swu['gender_dirty'])
swu <- mutate(swu, gender = case_when(
  gender_dirty == 'Masculino' ~ 'Male',
  gender_dirty == 'Femenino' ~ 'Female',
  gender_dirty == 'Prefiero no decir' ~ 'Prefer not to say',
  TRUE ~ gender_dirty
))
unique(swu['gender'])
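
As an alternative view of the same translation idea, the sketch below uses a named lookup vector in base R instead of case_when(). The sample answers are hypothetical, and this is only one possible way to express the mapping, not the approach used in this analysis.

```r
# Spanish answer -> English label (same pairs as in the case_when() above)
gender_lookup <- c(
  "Masculino"         = "Male",
  "Femenino"          = "Female",
  "Prefiero no decir" = "Prefer not to say"
)

# Hypothetical mix of Spanish and English answers
answers <- c("Masculino", "Female", "Prefiero no decir", "Femenino")

# Translate when a lookup entry exists, otherwise keep the answer as written
translated <- ifelse(
  answers %in% names(gender_lookup),
  unname(gender_lookup[answers]),
  answers
)
translated
```

A lookup vector keeps the translation pairs in one place, which can be easier to review than a long chain of conditions.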

Age

unique(swu['age_dirty'])
swu <- mutate(swu, age = case_when(
  age_dirty == 'Más de 66' ~ 'More than 66',
  TRUE ~ age_dirty
))
unique(swu['age'])

User location

unique(swu['location_dirty'])
swu <- mutate(swu, location = case_when(
  location_dirty == 'Norteamérica' ~ 'North America',
  location_dirty == 'América Central y Sudamérica' ~ 'South America',
  location_dirty == 'Central and South America' ~ 'South America',
  location_dirty == 'Europa' ~ 'Europe',
  location_dirty == 'África' ~ 'Africa',
  location_dirty == 'Oceanía' ~ 'Australia',
  TRUE ~ location_dirty
))
unique(swu['location'])

Main usage

unique(swu['usage_dirty'])
swu <- mutate(swu, usage = case_when(
  
  grepl('(?i)entrenamiento|(?i)training', usage_dirty) ~ 'Training tracker',
  grepl('(?i)celular|(?i)cell|(?i)notification', usage_dirty) ~ 'Shortcut to cell',
  grepl('(?i)salud|(?i)health', usage_dirty) ~ 'Health tracker',
  
  # 'Other' options listed below
  grepl('(?i)opciones|(?i)options', usage_dirty) ~ 'All options',
  grepl('(?i)hora|(?i)reloj|(?i)time|(?i)watch', usage_dirty) ~ 'Watch usage',
  grepl('(?i)pago|(?i)pay|(?i)payment', usage_dirty) ~ 'Payments',
  
  TRUE ~ usage_dirty
))
unique(swu['usage'])
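
The pattern-based grouping above depends on two details: matching is case-insensitive, and case_when() returns the result of the first condition that matches. The sketch below illustrates both on hypothetical free-text answers; it uses the ignore.case argument of grepl() rather than the inline (?i) flag, which is an equivalent way to ask for case-insensitive matching.

```r
library(dplyr)

# Hypothetical free-text answers, not taken from the survey
answers <- c("Entrenamiento funcional", "HEALTH checks", "Health and training", "Ver la hora")

usage <- case_when(
  grepl("entrenamiento|training", answers, ignore.case = TRUE) ~ "Training tracker",
  grepl("salud|health", answers, ignore.case = TRUE) ~ "Health tracker",
  TRUE ~ answers  # unmatched answers are kept as written
)
usage
```

The third answer mentions both health and training and is labelled 'Training tracker' because that condition comes first, so the order of the conditions is part of the classification.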

Brand

unique(swu['brand_dirty'])
swu <- mutate(swu, brand = case_when(
  
  grepl('I have Samsung and Colmi. Now I use Colmi', brand_dirty) ~ 'Colmi',
  grepl('(?i)Suunto', brand_dirty) ~ 'Suunto',
  grepl('Sinrelojinteligent|(?i)xxx', brand_dirty) ~ NA,
  
  TRUE ~ brand_dirty
))
unique(swu['brand'])

Group by demographic fields

We need to create a new data frame for every multiple-value column, along with the demographic fields, for individual analysis.

Features

‘Features’ is a question about the common uses that a user gives to their smartwatch as a device.
Users could mark as many features as they use on their smartwatches, so we need to split the checked options into separate rows in order to process them.

features_dirty <- select(swu, timestamp, gender, age, location, periodicity, brand, features_dirty)
features_rows <- separate_longer_delim(features_dirty, features_dirty, ', ')
unique(features_rows['features_dirty'])
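
The splitting step can be illustrated on a tiny example. This is a sketch with hypothetical respondents and answers, assuming tidyr 1.3.0 or later (where separate_longer_delim() is available).

```r
library(tidyr)

# Two hypothetical respondents; the first one checked three features
toy <- data.frame(
  id             = c(1, 2),
  features_dirty = c("Alarm, Music, Weather", "Alarm")
)

# One row per checked option; the other columns are repeated for each option
long <- separate_longer_delim(toy, features_dirty, delim = ", ")
long
```

One caveat of splitting on ', ' is that a custom answer that itself contains a comma followed by a space would also be split into several rows.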

We will create two columns for features: the first translates the values and the second groups them into broader categories.

features_rows <- mutate(features_rows, features = case_when(
  grepl('(?i)deporte|(?i)sport|(?i)pasos', features_dirty) ~ 'Sports monitor',
  grepl('(?i)alarma|(?i)alarm', features_dirty) ~ 'Alarm',
  grepl('(?i)sedentary|(?i)sedentarismo', features_dirty) ~ 'Sedentary reminder',
  grepl('(?i)agua|(?i)water', features_dirty) ~ 'Water drink reminder',
  grepl('(?i)notificaciones|(?i)notifications', features_dirty) ~ 'Cell notifications',
  grepl('(?i)slack|(?i)text', features_dirty) ~ 'Text messages',
  grepl('(?i)calendar', features_dirty) ~ 'Calendar',
  grepl('(?i)música', features_dirty) ~ 'Music',
  grepl('(?i)cámara|(?i)camara', features_dirty) ~ 'Camara',
  grepl('(?i)teléfono|(?i)telefónicas', features_dirty) ~ 'Phone calls',
  grepl('(?i)voz', features_dirty) ~ 'Voice control',
  grepl('(?i)hora|(?i)time|(?i)watch', features_dirty) ~ 'Watch',
  grepl('(?i)pago', features_dirty) ~ 'Contactless payments',
  grepl('(?i)calorías|(?i)salud|(?i)estres|(?i)sueño|(?i)presion|(?i)cardiaca', features_dirty) ~ 'Health monitor',
  grepl('(?i)clima|(?i)atmosfericos|(?i)weather', features_dirty) ~ 'Weather monitor',
  
  TRUE ~ features_dirty
))
unique(features_rows['features'])
features_grouped_rows <- mutate(features_rows, features_grouped = case_when(
  grepl('(?i)sports|(?i)sport', features) ~ 'Sports monitor',
  grepl('(?i)alarm|(?i)sedentary|(?i)water', features) ~ 'Activity reminder',
  grepl('(?i)notifications|(?i)text|(?i)email|(?i)calendar', features) ~ 'Cell notifications',
  grepl('(?i)music|(?i)camara|(?i)phone|(?i)voice', features) ~ 'Cell control',
  grepl('(?i)time|(?i)watch', features) ~ 'Watch',
  grepl('(?i)payment|(?i)SOS|(?i)GPS', features) ~ 'Other features',
  grepl('(?i)calories|(?i)stress|(?i)sleep|(?i)presion|(?i)cardiac', features) ~ 'Health monitor',
  grepl('(?i)weather', features) ~ 'Weather monitor',
  
  TRUE ~ features
))
unique(features_grouped_rows['features_grouped'])

Tracked functionalities

‘tracked_functionalities’ is a question to determine which features users track through their smartwatches. Users could mark as many features as they track, so we need to split the checked options into separate rows in order to process them.

functionalities_dirty <- select(swu, timestamp, gender, age, location, periodicity, brand, functionalities_dirty)
functionalities_rows <- separate_longer_delim(functionalities_dirty, functionalities_dirty, ', ')
unique(functionalities_rows['functionalities_dirty'])
functionalities_grouped_rows <- mutate(functionalities_rows, functionalities = case_when(
  grepl('(?i)deporte|(?i)sport|(?i)pasos', functionalities_dirty) ~ 'Sports',
  grepl('(?i)presión arterial|(?i)blood', functionalities_dirty) ~ 'Blood pressure',
  grepl('(?i)calorías|(?i)calorias|(?i)calories', functionalities_dirty) ~ 'Calories',
  grepl('(?i)distancia|(?i)distance', functionalities_dirty) ~ 'Distance',
  grepl('(?i)cardíaco|(?i)heart', functionalities_dirty) ~ 'Heart rate',
  grepl('(?i)sueño|(?i)sleep', functionalities_dirty) ~ 'Sleep',
  grepl('(?i)agua|(?i)water', functionalities_dirty) ~ 'Water',
  grepl('(?i)peso|(?i)weight', functionalities_dirty) ~ 'Weight',
  grepl('(?i)temperatura|(?i)temperature', functionalities_dirty) ~ 'Temperature',
  grepl('(?i)menstrual', functionalities_dirty) ~ 'Menstrual health',
  grepl('(?i)altitud|(?i)altitude|elevation', functionalities_dirty) ~ 'Altitude',
  grepl('(?i)oxigeno|(?i)oxígeno', functionalities_dirty) ~ 'Oxygen',
  grepl('(?i)estrés|(?i)stress', functionalities_dirty) ~ 'Stress',
  grepl('(?i)noise|(?i)ruido', functionalities_dirty) ~ 'Noise',
  grepl('(?i)hora', functionalities_dirty) ~ NA,
  
  TRUE ~ functionalities_dirty
))
unique(functionalities_grouped_rows['functionalities'])
functionalities_grouped_rows <- mutate(functionalities_grouped_rows, functionalities_grouped = case_when(
  grepl('(?i)sport|(?i)distance|(?i)altitude', functionalities) ~ 'Sports',
  grepl('(?i)blood|(?i)heart|(?i)sleep|(?i)temperature|(?i)oxygen|(?i)noise', functionalities) ~ 'Realtime health tracker',
  grepl('(?i)calories|(?i)water|(?i)weight|(?i)menstrual', functionalities) ~ 'Manual health tracker',
  grepl('(?i)stress|(?i)Mindfulness', functionalities) ~ 'Wellbeing',
  
  TRUE ~ functionalities
))
unique(functionalities_grouped_rows['functionalities_grouped'])

Activities

‘activities’ is a question to determine which sport activities smartwatch users track the most. Users could mark as many activities as they track, so we need to split the checked options into separate rows in order to process them.

activities_dirty <- select(swu, timestamp, gender, age, location, periodicity, brand, activities_dirty)
activities_rows <- separate_longer_delim(activities_dirty, activities_dirty, ', ')
unique(activities_rows['activities_dirty'])
activities_grouped_rows <- mutate(activities_rows, activities = case_when(
  grepl('(?i)caminata|(?i)pasos|(?i)walking|(?i)patinar|(?i)skating|(?i)bailar|(?i)dancing', activities_dirty) ~ 'Urban sports',
  grepl('(?i)correr|(?i)running|(?i)ciclismo|(?i)cycling|(?i)tenis|(?i)tennis|(?i)futbol|(?i)football|(?i)soccer|(?i)enduro', activities_dirty) ~ 'Professional sports',
  grepl('(?i)hiking|(?i)trakking|(?i)senderismo|(?i)splitboard|(?i)esqui|(?i)esquí|(?i)ski|(?i)escalada|(?i)climbing', activities_dirty) ~ 'Mountain sports',
  grepl('(?i)natación|(?i)natacion|(?i)swimming|(?i)buceo|(?i)diving', activities_dirty) ~ 'Water sports',
  grepl('(?i)gimnasio|(?i)gym|(?i)entrenamiento', activities_dirty) ~ 'General gym training',
  grepl('(?i)fuerza|(?i)crossfit|(?i)funcional|(?i)cross|(?i)strong|(?i)fitness|(?i)weight|(?i)strength|(?i)workouts', activities_dirty) ~ 'Strength and endurance sports',
  grepl('(?i)pilates|(?i)yoga|(?i)stretching|(?i)meditation', activities_dirty) ~ 'Relaxing sports',

  grepl('(?i)ninguna|(?i)hora', activities_dirty) ~ NA,
  
  TRUE ~ activities_dirty
))
unique(activities_grouped_rows['activities'])

ANALYZE

We need to identify trends and relationships within the data so that we can accurately answer the questions asked.

# Install once if needed: install.packages(c('ggplot2', 'lessR', 'scales'))
library(ggplot2)
library(lessR)   # note: masks recode() and rename() from dplyr
library(scales)  # note: masks rescale() from lessR, discard() from purrr and col_factor() from readr

Demographic

We want to identify demographic trends in the sample:

  • Gender
  • Age
  • Geographic location

Gender

We identified two main genders and a third category for those who would rather not say.

gender_table <- table(swu['gender'])

PieChart(gender_table, hole = 0, values = "%", main = "Gender distribution", fill = "reds")

--- gender_table --- 

               Female   Male  Prefer not to say     Total 
Frequencies:      238    370                 13       621 
Proportions:    0.383  0.596              0.021     1.000 

Chi-squared test of null hypothesis of equal probabilities 
  Chisq = 314.812, df = 2, p-value = 0.000 
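
The proportions and the chi-squared statistic reported above can be reproduced from the frequencies alone. The sketch below uses base R's chisq.test(), which tests against equal probabilities by default when given a vector of counts; it is an independent check, not the code lessR runs internally.

```r
# Frequencies taken from the gender table above
counts <- c("Female" = 238, "Male" = 370, "Prefer not to say" = 13)

# Share of each category in the sample
props <- round(prop.table(counts), 3)
props

# Goodness-of-fit test against equal probabilities (1/3 each)
test <- chisq.test(counts)
round(unname(test$statistic), 3)  # 314.812
unname(test$parameter)            # degrees of freedom: 2
```

The very small p-value only says that the three categories are not equally frequent in the sample; it says nothing about how representative the sample is of smartwatch users in general.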

Age

We created age groups, with a range of 10 years each.

age_table <- table(swu['age'])

PieChart(age_table, hole = 0, values = "%", main = "Age distribution", fill = "reds")

--- age_table --- 

               15 - 25  26 - 35  36 - 45  46 - 55  56 - 65  More than 66     Total 
Frequencies:       101      201      194       83       35             7       621 
Proportions:     0.163    0.324    0.312    0.134    0.056         0.011     1.000 

Chi-squared test of null hypothesis of equal probabilities 
  Chisq = 310.411, df = 5, p-value = 0.000 

Geographic location

We want to know how smartwatch users are distributed across continents.

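library(sf)             # sf_use_s2(), st_union(), st_centroid(), st_coordinates()
library(rnaturalearth)  # ne_countries()
# Tell sf to treat world map data as a 'flat' surface instead of a sphere
sf_use_s2(FALSE)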
Spherical geometry (s2) switched off
# Import world map, dissolve/union polygons by continent, and add bubble lon/lat
# locations for plotting
continents <- ne_countries(returnclass='sf') %>%
  # Russia has incorrect continent value, so need to change it
  mutate(continent = ifelse(sovereignt == "Russia", "Asia", continent)) %>%
  group_by(continent) %>%
  summarise(geom = st_union(geometry)) %>%
  filter(!continent == "Seven seas (open ocean)") %>%
  mutate(centroid_lon = st_coordinates(st_centroid(.))[,1],
         centroid_lat = st_coordinates(st_centroid(.))[,2])
although coordinates are longitude/latitude, st_union assumes that they are planar
Warning: There were 4 warnings in `stopifnot()`.
The first warning was:
ℹ In argument: `centroid_lon = st_coordinates(st_centroid(.))[, 1]`.
Caused by warning:
! st_centroid assumes attributes are constant over geometries
ℹ Run dplyr::last_dplyr_warnings() to see the 3 remaining warnings.
# dataset calculation
location_dirty <- select(swu, location)

locations<- location_dirty %>%
  group_by(location) %>% 
  summarise(count = n())

colnames(locations) <- c("continent", "count")

# Join count data to continents
continents <- left_join(continents, locations, by = "continent")

# Plot
ggplot(data = continents) +
  geom_sf() +
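  # color is a fixed value, so it goes outside aes(); inside aes() it would be treated as a data mapping
  geom_point(aes(x = centroid_lon, y = centroid_lat, size = count), color = "red") +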
  scale_size(range = c(1, 10)) +
  labs(size = "Count", title = "Location distribution") +
  theme(axis.title = element_blank())


# It is a good habit to turn S2 back on after you are done
sf_use_s2(TRUE)
Spherical geometry (s2) switched on
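
Because the counts are attached to the map with a left join on the continent name, any survey label that does not exactly match a map label ends up without a bubble. A quick way to spot such labels is sketched below; the list of map labels is an assumption about the continent names in the Natural Earth data, and the survey labels are illustrative values based on the recoding in the PROCESS section.

```r
# Continent labels assumed to be present in the map data after filtering
map_continents <- c("Africa", "Antarctica", "Asia", "Europe",
                    "North America", "Oceania", "South America")

# Illustrative location labels as produced by the recoding step
survey_locations <- c("South America", "Europe", "North America", "Australia", "Asia")

# Labels that would not find a partner in the join
unmatched <- setdiff(survey_locations, map_continents)
unmatched
```

If the assumption about the map labels holds, answers recoded as 'Australia' would need to be relabelled (for example to 'Oceania') before the join for their count to appear on the map.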

Demographic relations

Age vs Gender

age_perc <- swu %>% 
  group_by(gender, age) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2)))
# Chart
ggplot(age_perc, aes(x = factor(gender), y = perc, fill = factor(age))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "age", title = 'Distribution of Age per Gender (%)', subtitle = 'Stacked bars version') +
  theme_minimal(base_size = 14) +
  geom_text(data = age_perc, aes(y = perc, label = ratio), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Age groups"))
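
The percentage column used in these charts is a share within each gender, not a share of the whole sample. That follows from how summarise() drops the last grouping variable, which the sketch below makes explicit on hypothetical data with the .groups argument.

```r
library(dplyr)

# Hypothetical respondents
toy <- tibble(
  gender = c("Female", "Female", "Female", "Male", "Male"),
  age    = c("15 - 25", "26 - 35", "26 - 35", "15 - 25", "36 - 45")
)

age_perc_toy <- toy %>%
  group_by(gender, age) %>%
  # after summarise() the result is still grouped by gender only
  summarise(count = n(), .groups = "drop_last") %>%
  # so sum(count) is the total per gender, and perc sums to 1 within each gender
  mutate(perc = count / sum(count))

age_perc_toy
```

Stating .groups explicitly also silences the informational message that summarise() prints about the remaining grouping.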

ggplot(data = age_perc) + 
  geom_bar(
    aes(x = gender, y = perc, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = age),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Age per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = age_perc) + 
  geom_bar(
    aes(x = gender, y = count, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = age),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Age per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Age vs Location

age_loc_perc <- swu %>% 
  group_by(location, age) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2)))
# Chart
ggplot(age_loc_perc, aes(x = factor(location), y = perc, fill = factor(age))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "age", title = 'Distribution of Age per Location (%)', subtitle = 'Stacked bars version') +
  theme_minimal(base_size = 14) +
  geom_text(data = age_loc_perc, aes(y = perc, label = ratio), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Age groups"))

ggplot(data = age_loc_perc) + 
  geom_bar(
    aes(x = location, y = perc, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = age),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Age per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = age_loc_perc) + 
  geom_bar(
    aes(x = location, y = count, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = age),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Age per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Gender vs Location

gender_loc_perc <- swu %>% 
  group_by(location, gender) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2)))
# Chart
ggplot(gender_loc_perc, aes(x = factor(location), y = perc, fill = factor(gender))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "gender", title = 'Distribution of Gender per Location (%)', subtitle = 'Stacked bars version') +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_loc_perc, aes(y = perc, label = ratio), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Gender")) +
  theme(axis.text.x = element_text(angle = 45))

ggplot(data = gender_loc_perc) + 
  geom_bar(
    aes(x = location, y = count, fill = gender, group = gender), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = gender),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  guides(fill = guide_legend(title = "Gender")) +
  labs(x = "Location", y = "Count", title = "Distribution of Gender per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Establishing relations between variables

Periodicity

Periodicity vs Gender

We want to identify whether there is a relationship between how often a smartwatch is used (periodicity) and the user’s gender.

periodicity_gender <- swu %>% 
  group_by(gender, periodicity) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(periodicity)
# Chart
ggplot(periodicity_gender, aes(x = factor(gender), y = perc*100, fill = factor(periodicity))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "periodicity", title = "Percentage distribution of Periodicity per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = periodicity_gender, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Periodicity"))
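
One detail worth keeping in mind when reading the percentage labels: in the summary that feeds these charts, the shares are computed before the rows with a missing periodicity are dropped, so the remaining labels within a group may add up to less than 100%. The sketch below shows the difference on hypothetical data; which variant is appropriate depends on whether non-answers should count towards the total.

```r
library(dplyr)
library(tidyr)

# Hypothetical group of four respondents, one without a periodicity answer
toy <- tibble(gender = rep("Female", 4), periodicity = c(7, 7, 3, NA))

share <- function(data) {
  data %>%
    group_by(gender, periodicity) %>%
    summarise(count = n(), .groups = "drop_last") %>%
    mutate(perc = count / sum(count)) %>%
    ungroup()
}

drop_after  <- toy %>% share() %>% drop_na(periodicity)  # NA counted in the total
drop_before <- toy %>% drop_na(periodicity) %>% share()  # NA excluded from the total

sum(drop_after$perc)   # 0.75
sum(drop_before$perc)  # 1
```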

# Chart
ggplot(data = periodicity_gender) + 
  geom_bar(
    aes(x = gender, y = perc, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = periodicity),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Periodicity per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = periodicity_gender) + 
  geom_bar(
    aes(x = gender, y = count, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = periodicity),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Periodicity per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Periodicity vs Age

We want to identify whether there is a relationship between how often a smartwatch is used (periodicity) and the user’s age.

periodicity_age <- swu %>% 
  group_by(age, periodicity) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(periodicity)
# Chart
ggplot(periodicity_age, aes(x = factor(age), y = perc*100, fill = factor(periodicity))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "periodicity", title = "Percentage distribution of Periodicity per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = periodicity_age, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Periodicity"))

# Chart
ggplot(data = periodicity_age) + 
  geom_bar(
    aes(x = age, y = perc, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = periodicity),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Periodicity per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = periodicity_age) + 
  geom_bar(
    aes(x = age, y = count, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = periodicity),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Periodicity per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Periodicity vs Location

We want to identify whether there is a relationship between how often a smartwatch is used (periodicity) and the user’s location.

periodicity_location <- swu %>% 
  group_by(location, periodicity) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(periodicity)
# Chart
ggplot(periodicity_location, aes(x = factor(location), y = perc*100, fill = factor(periodicity))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "periodicity", title = "Percentage distribution of Periodicity per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = periodicity_location, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Periodicity")) +
  theme(axis.text.x = element_text(angle = 45))

# Chart
ggplot(data = periodicity_location) + 
  geom_bar(
    aes(x = location, y = perc, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = periodicity),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Periodicity per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = periodicity_location) + 
  geom_bar(
    aes(x = location, y = count, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = periodicity),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Periodicity per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Brand vs Gender

We want to identify whether there is a relationship between the brand of a smartwatch and the user’s gender.

brand_gender <- swu %>% 
  group_by(gender, brand) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(brand)
# Chart
ggplot(brand_gender, aes(x = factor(gender), y = perc * 100, fill = factor(brand))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "brand", title = "Percentage distribution of Brand per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = brand_gender, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Brand"))

# Chart
ggplot(data = brand_gender) + 
  geom_bar(
    aes(x = gender, y = perc, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = brand),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Brand per gender (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = brand_gender) + 
  geom_bar(
    aes(x = gender, y = count, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = brand),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Brand per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Brand vs Age

We want to identify whether there is a relationship between the brand of a smartwatch and the user’s age.

brand_age <- swu %>% 
  group_by(age, brand) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(brand)
# Chart
ggplot(brand_age, aes(x = factor(age), y = perc * 100, fill = factor(brand))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "brand", title = "Percentage distribution of Brand per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = brand_age, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Brand"))

# Chart
ggplot(data = brand_age) + 
  geom_bar(
    aes(x = age, y = perc, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = brand),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Brand per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = brand_age) + 
  geom_bar(
    aes(x = age, y = count, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = brand),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Brand per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Brand vs Location

We want to identify whether there is a relationship between the brand of a smartwatch and the user’s location.

brand_location <- swu %>% 
  group_by(location, brand) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(brand)
# Chart
ggplot(brand_location, aes(x = factor(location), y = perc * 100, fill = factor(brand))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "brand", title = "Percentage distribution of Brand per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = brand_location, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Brand"))

# Chart
ggplot(data = brand_location) + 
  geom_bar(
    aes(x = location, y = perc, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = brand),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Brand per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = brand_location) + 
  geom_bar(
    aes(x = location, y = count, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = brand),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Brand per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Usage

Usage vs Gender

We want to identify whether there is a relationship between the main usage of a smartwatch and the user’s gender.

gender_usage <- swu %>% 
  group_by(gender, usage) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(usage)
# Chart
ggplot(gender_usage, aes(x = factor(gender), y = perc * 100, fill = factor(usage))) +
  geom_bar(stat ="identity", width = 0.7, position = "fill") +
  labs(x = "Gender", y = "Percent", fill = "usage", title = "Percentage distribution of Usage per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_usage, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Usage"))

# Chart
ggplot(data = gender_usage) + 
  geom_bar(
    aes(x = gender, y = perc, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = usage),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Usage per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = gender_usage) + 
  geom_bar(
    aes(x = gender, y = count, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = usage),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Usage per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Usage vs Age

We want to identify whether there is a relationship between the main usage of a smartwatch and the user’s age.

age_usage <- swu %>% 
  group_by(age, usage) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(usage)
# Chart
ggplot(age_usage, aes(x = factor(age), y = perc * 100, fill = factor(usage))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "usage", title = "Percentage distribution of Usage per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_usage, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Usage")) +
  theme(axis.text.x = element_text(angle = 45))

# Chart
ggplot(data = age_usage) + 
  geom_bar(
    aes(x = age, y = perc, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = usage),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Usage per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = age_usage) + 
  geom_bar(
    aes(x = age, y = count, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = usage),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Usage per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Usage vs Location

We want to identify whether there is a relationship between the main usage of a smartwatch and the user’s location.

location_usage <- swu %>% 
  group_by(location, usage) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(usage)
# Chart
ggplot(location_usage, aes(x = factor(location), y = perc * 100, fill = factor(usage))) +
  geom_bar(stat="identity", width = 0.7, position = "fill") +
  labs(x = "Location", y = "Percent", fill = "usage", title = "Percentage distribution of Usage per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = location_usage, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title = "Usage")) +
  theme(axis.text.x = element_text(angle = 45))

# Chart
ggplot(data = location_usage) + 
  geom_bar(
    aes(x = location, y = perc, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = usage),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Usage per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = location_usage) + 
  geom_bar(
    aes(x = location, y = count, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = usage),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Usage per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Features

Features vs Gender

We want to identify whether there is a relationship between the smartwatch features used and the user’s gender.

gender_features <- features_grouped_rows %>% 
  group_by(gender, features_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(features_grouped)
# Chart
ggplot(gender_features, aes(x = factor(gender), y = perc * 100, fill = factor(features_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Gender", y = "Percent", fill = "features_grouped", title = "Percentage distribution of Features per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_features, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Features"))

# Chart
ggplot(data = gender_features) + 
  geom_bar(
    aes(x = gender, y = perc, fill = features_grouped, group = features_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Features per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = gender_features) + 
  geom_bar(
    aes(x = gender, y = count, fill = features_grouped, group = features_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Features per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Features vs Age

We want to identify whether there is a relationship between the smartwatch features used and the user’s age.

age_features <- features_grouped_rows %>% 
  group_by(age, features_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(features_grouped)
# Chart
ggplot(age_features, aes(x = factor(age), y = perc * 100, fill = factor(features_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Age", y = "Percent", fill = "features_grouped", title = "Percentage distribution of Features per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_features, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Features")) +
  theme(axis.text.x = element_text(angle = 45))

# Chart
ggplot(data = age_features) + 
  geom_bar(
    aes(x = age, y = perc, fill = features_grouped, group = features_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Features per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = age_features) + 
  geom_bar(
    aes(x = age, y = count, fill = features_grouped, group = features_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Features per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Features vs Location

We want to identify whether there is a relationship between the smartwatch features used and the user’s location.

location_features <- features_grouped_rows %>% 
  group_by(location, features_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(features_grouped)
# Chart
ggplot(location_features, aes(x = factor(location), y = perc * 100, fill = factor(features_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Location", y = "Percent", fill = "features_grouped", title = "Percentage distribution of Features per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = location_features, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Features")) +
  theme(axis.text.x = element_text(angle = 45))

# Chart
ggplot(data = location_features) + 
  geom_bar(
    aes(x = location, y = perc, fill = features_grouped, group = features_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Features per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = location_features) + 
  geom_bar(
    aes(x = location, y = count, fill = features_grouped, group = features_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Features per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Functionalities

Functionalities vs Gender

We want to identify whether there is a relationship between the functionalities tracked with a smartwatch and the user’s gender.

gender_functionalities <- functionalities_grouped_rows %>% 
  group_by(gender, functionalities_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(functionalities_grouped)
# Chart
ggplot(gender_functionalities, aes(x = factor(gender), y = perc * 100, fill = factor(functionalities_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Gender", y = "Percent", fill = "functionalities_grouped", title = "Percentage distribution of Functionalities per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_functionalities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Functionalities"))

# Chart
ggplot(data = gender_functionalities) + 
  geom_bar(
    aes(x = gender, y = perc, fill = functionalities_grouped, group = functionalities_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Functionalities per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = gender_functionalities) + 
  geom_bar(
    aes(x = gender, y = count, fill = functionalities_grouped, group = functionalities_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Functionalities per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Functionalities vs Age

We want to identify whether there is a relationship between the functionalities tracked with a smartwatch and the user’s age.

age_functionalities <- functionalities_grouped_rows %>% 
  group_by(age, functionalities_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(functionalities_grouped)
# Chart
ggplot(age_functionalities, aes(x = factor(age), y = perc*100, fill = factor(functionalities_grouped))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "functionalities_grouped", title = "Percentage distribution of Functionalities per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_functionalities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Functionalities")) +
  theme(axis.text.x = element_text(angle = 45))

# Chart
ggplot(data = age_functionalities) + 
  geom_bar(
    aes(x = age, y = perc, fill = functionalities_grouped, group = functionalities_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Functionalities per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = age_functionalities) + 
  geom_bar(
    aes(x = age, y = count, fill = functionalities_grouped, group = functionalities_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Functionalities per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Functionalities vs Location

We want to identify whether there is a relationship between the functionalities tracked with a smartwatch and the user’s location.

locacion_functionalities <- functionalities_grouped_rows %>% 
  group_by(location, functionalities_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(functionalities_grouped)
# Chart
ggplot(locacion_functionalities, aes(x = factor(location), y = perc*100, fill = factor(functionalities_grouped))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "functionalities_grouped", title = "Percentage distribution of Functionalities per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = locacion_functionalities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Functionalities")) +
  theme(axis.text.x = element_text(angle = 45))

# Chart
ggplot(data = locacion_functionalities) + 
  geom_bar(
    aes(x = location, y = perc, fill = functionalities_grouped, group = functionalities_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Functionalities per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()

ggplot(data = locacion_functionalities) + 
  geom_bar(
    aes(x = location, y = count, fill = functionalities_grouped, group = functionalities_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Functionalities per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()

Activities

Activities vs Gender

We want to identify whether there is a relationship between the sport activities that users track and their gender.

gender_activities <- activities_grouped_rows %>% 
  group_by(gender, activities) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(count/sum(count))) %>% 
  drop_na(activities)

# Chart
ggplot(gender_activities, aes(x = factor(gender), y = perc*100, fill = factor(activities))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "activities", title = "Percentage distribution of 'Activities' per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_activities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Activities"))

Activities vs Age

We want to identify whether there is a relationship between the sport activities that users track and their age.

age_activities <- activities_grouped_rows %>% 
  group_by(age, activities) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(count/sum(count))) %>% 
  drop_na(activities)

# Chart
ggplot(age_activities, aes(x = factor(age), y = perc*100, fill = factor(activities))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "activities", title = "Percentage distribution of 'Activities' per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_activities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Activities")) +
  theme(axis.text.x = element_text(angle = 45))
---
title: "Smart devices usage"
output: html_notebook
---

# CONTEXT

Bellabeat is a high-tech company that has manufactured health-focused smart products since 2013. Their products:

* **Bellabeat app**: provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. It connects to their line of smart wellness products.
* **Leaf**: a wellness tracker that can be worn as a bracelet, necklace, or clip. It connects to the Bellabeat app to track activity, sleep, and stress.
* **Time**: a wellness watch that tracks user activity, sleep, and stress. It connects to the Bellabeat app to provide daily wellness insights.
* **Spring**: a water bottle that tracks daily water intake and connects to the Bellabeat app to track hydration levels.
* **Bellabeat membership**: a subscription-based membership program that gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

Bellabeat wants insights into smart device usage in order to identify trends that can inform its marketing strategy.

# ASK

## Business task
Analyze smart device usage to gain insights into how people are already using their smart devices and provide high-level recommendations to improve marketing strategies for Bellabeat.

## Business use case
* Determine trends for smart device usage
* How could these trends apply to Bellabeat 'Time' customers?
* How could these trends help to improve Bellabeat's marketing strategy?

# PREPARE
For the analysis, I will use a custom survey (first-party data) to get up-to-date data on smartwatch usage, and R with RStudio to perform the analysis.

## Building the survey
### Platform
Built in Google Forms and available in two languages: English and Spanish.

### Distribution
Distributed through a link shared in a LinkedIn post, Instagram stories, and WhatsApp groups. There were no restrictions on sharing the link.

### Time frame
Available for 7 days, starting on 01/26/2024.

### Configuration
Anonymous survey.

### Questions

#### Demographic questions
* Gender
* Age
* Geographic location

#### Smartwatch usage questions
* Main usage: multiple choice, with the option to add a custom answer
* Usage periodicity: scale from 0 to 7 representing usage in days
* Brand: multiple choice, listing the main brands collected from sites like Amazon, Tiendamia, and Mercado Libre, with the option to add a custom answer.
* Features used: checkboxes, listing the main features offered by smartwatches on the market, with the option to add a custom answer.
* Tracked functionalities: checkboxes, listing the main tracking options offered by smartwatches on the market, with the option to add a custom answer.
* Activities tracked: checkboxes, listing the main sport activities tracked by smartwatches on the market, with the option to add a custom answer.

## Evaluating data
Installing packages
```{r Installing packages, echo=FALSE}
# Install only the packages that are missing. Reinstalling a package that is
# already loaded triggers the "Updating loaded packages" error in RStudio.
required_packages <- c(
  "tidyverse", "dplyr", "skimr", "janitor", "gghighlight",
  "tidytext", "rnaturalearth", "sf", "forcats"
)
missing_packages <- setdiff(required_packages, rownames(installed.packages()))
if (length(missing_packages) > 0) install.packages(missing_packages)

library(tidyverse)
library(dplyr)
library(skimr)
library(janitor)
library(gghighlight)
library(tidytext)
library(sf) # For spatial data
library(rnaturalearth) # For map data
library(forcats)
```

## Importing data
We need to import both survey files: the English and the Spanish versions.
```{r Importing survey files, message=FALSE, warning=FALSE}
swu_file_en <- read_csv("Smartwatch usage - english.csv")
swu_file_es <- read_csv("Smartwatch usage - Spanish.csv")
```

### Get preview of imported data
We need to check that both files were imported without errors.
```{r}
head(swu_file_en)
head(swu_file_es)
```

### Changing column names for both files
Each file has different column names because the survey was shared in two languages, although both files contain the same information.
```{r Changing column names}
swu_es <- rename(swu_file_es, 
       'gender_dirty' = '¿Con qué genero te identificas?',
       'age_dirty' = '¿Qué edad tenés?',
       'location_dirty' = '¿Dónde vivís?',
       'usage_dirty' = '¿Cuál es el principal uso que le das a tu reloj inteligente?',
       'periodicity' = '¿Cuántos días utilizas alguna funcionalidad de tu reloj inteligente?',
       'brand_dirty' = '¿Cuál es la marca de tu reloj inteligente?',
       'features_dirty' = '¿Cuáles son las funciones que utilizas en tu reloj inteligente?',
       'functionalities_dirty' = '¿Qué funciones utilizas para realizar un seguimiento con tu reloj inteligente?',
       'activities_dirty' = '¿Qué actividades deportivas seguís con tu reloj inteligente?'
       )

swu_en <- rename(swu_file_en, 
       'gender_dirty' = 'What gender do you identify with?',
       'age_dirty' = 'How old are you?',
       'location_dirty' = 'Where do you live?',
       'usage_dirty' = 'Which is the main use you give to your smartwatch?',
       'periodicity' = 'How many days do you use any functionality of your smartwatch?',
       'brand_dirty' = 'Which is the brand of your smartwatch?',
       'features_dirty' = 'What are the features you use on your smartwatch?',
       'functionalities_dirty' = 'What features do you use to track with your smartwatch?',
       'activities_dirty' = 'What sport activities do you track with your smartwatch?'
       )
```
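
As a hedged alternative sketch, the same renaming could be driven by one lookup vector per language, which makes it easy to confirm that both files end up with identical column names. The toy data frames and the names `lookup_en`, `lookup_es`, `toy_en`, and `toy_es` are assumptions for illustration only, and just two of the nine questions are shown.

```r
library(dplyr)

# Hypothetical miniature versions of the two survey exports.
toy_en <- tibble(
  `How old are you?` = "26-35",
  `Where do you live?` = "Europe"
)
toy_es <- tibble(
  `¿Qué edad tenés?` = "36-45",
  `¿Dónde vivís?` = "Europa"
)

# Named vectors: names are the new column names, values are the old ones.
lookup_en <- c(age_dirty = "How old are you?", location_dirty = "Where do you live?")
lookup_es <- c(age_dirty = "¿Qué edad tenés?", location_dirty = "¿Dónde vivís?")

toy_en_renamed <- rename(toy_en, all_of(lookup_en))
toy_es_renamed <- rename(toy_es, all_of(lookup_es))

# Both data frames now share the same column names.
identical(names(toy_en_renamed), names(toy_es_renamed))
```

Keeping the question texts in lookup vectors also gives a single place to update if a question is reworded in a later survey round.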

### Binding files
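We need to combine both data frames into a single one so the answers can be processed together.
```{r Binding files into one larger file}
# bind_rows() keeps every response; union() would drop rows that are exact duplicates
swu_dirty <- bind_rows(swu_es, swu_en)
```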

### Checking resulting dataframe
The resulting data frame should have as many rows as both files combined.
```{r Checking resulting dataframe}
skim_without_charts(swu_dirty)
```
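
The row-count expectation can also be checked directly. The sketch below uses hypothetical toy data (`toy_es`, `toy_en`) to show why the check matters in an anonymous survey: two respondents can give identical answers, and a set operation such as dplyr's `union()` collapses those rows, whereas `bind_rows()` keeps every response.

```r
library(dplyr)

# Hypothetical responses: the two Spanish rows are exact duplicates.
toy_es <- tibble(gender_dirty = c("Femenino", "Femenino"), periodicity = c(7, 7))
toy_en <- tibble(gender_dirty = "Female", periodicity = 5)

toy_bound <- bind_rows(toy_es, toy_en) # stacks all rows
toy_union <- union(toy_es, toy_en)     # set union: duplicate rows are removed

expected_rows <- nrow(toy_es) + nrow(toy_en)

nrow(toy_bound) == expected_rows # TRUE
nrow(toy_union) == expected_rows # FALSE, one duplicate response was dropped
```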

# PROCESS
## Preparing environment for cleaning data

### General cleaning
We perform a general cleaning of the data:

* clean the column names (convert them to snake_case)
* remove empty rows and columns

```{r Running general cleaning}
swu <- swu_dirty %>% clean_names() %>% remove_empty(c("rows", "cols"))
```

### Translating fields
Because the survey was shared in two languages (English and Spanish), many of the answers need to be unified so that equivalent answers count as a single category.
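
A minimal sketch of that unification, assuming a toy vector of mixed-language answers (`toy_gender` is hypothetical): Spanish values are mapped to their English equivalents, and the final `TRUE ~ ...` branch leaves values that are already in English, or missing, unchanged.

```r
library(dplyr)

# Hypothetical answers as they might appear after binding both files.
toy_gender <- c("Masculino", "Female", "Prefiero no decir", NA)

toy_gender_clean <- case_when(
  toy_gender == "Masculino" ~ "Male",
  toy_gender == "Femenino" ~ "Female",
  toy_gender == "Prefiero no decir" ~ "Prefer not to say",
  TRUE ~ toy_gender # fallback: keep the original value
)

toy_gender_clean
```

Comparing `unique()` values before and after the recoding, as the chunks in the next sections do, is a quick way to spot answers that were missed.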

## Cleaning columns

### Gender
```{r Check for distinct values in gender column}
unique(swu['gender_dirty'])
```

```{r Creating a new column with mutated values in English for gender}
swu <- mutate(swu, gender = case_when(
  gender_dirty == 'Masculino' ~ 'Male',
  gender_dirty == 'Femenino' ~ 'Female',
  gender_dirty == 'Prefiero no decir' ~ 'Prefer not to say',
  TRUE ~ gender_dirty
))
unique(swu['gender'])
```

### Age
```{r Check for distinct values in age column}
unique(swu['age_dirty'])
```

```{r Creating a new column with mutated values in English for age}
swu <- mutate(swu, age = case_when(
  age_dirty == 'Más de 66' ~ 'More than 66',
  TRUE ~ age_dirty
))
unique(swu['age'])
```

### User location
```{r Check for distinct values in user_location column}
unique(swu['location_dirty'])
```

```{r Creating a new column with mutated values in English for user_location}
swu <- mutate(swu, location = case_when(
  location_dirty == 'Norteamérica' ~ 'North America',
  location_dirty == 'América Central y Sudamérica' ~ 'South America',
  location_dirty == 'Central and South America' ~ 'South America',
  location_dirty == 'Europa' ~ 'Europe',
  location_dirty == 'África' ~ 'Africa',
  location_dirty == 'Oceanía' ~ 'Australia',
  TRUE ~ location_dirty
))
unique(swu['location'])
```

### Main usage
```{r Check for distinct values in main_usage column}
unique(swu['usage_dirty'])
```

```{r Creating a new column with mutated values in English for usage_dirty}
swu <- mutate(swu, usage = case_when(
  
  grepl('(?i)entrenamiento|(?i)training', usage_dirty) ~ 'Training tracker',
  grepl('(?i)celular|(?i)cell|(?i)notification', usage_dirty) ~ 'Shortcut to cell',
  grepl('(?i)salud|(?i)health', usage_dirty) ~ 'Health tracker',
  
  # 'Other' options listed below
  grepl('(?i)opciones|(?i)options', usage_dirty) ~ 'All options',
  grepl('(?i)hora|(?i)reloj|(?i)time|(?i)watch', usage_dirty) ~ 'Watch usage',
  grepl('(?i)pago|(?i)pay|(?i)payment', usage_dirty) ~ 'Payments',
  
  TRUE ~ usage_dirty
))
unique(swu['usage'])
```
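
The mapping above relies on two behaviours that are easy to overlook: `case_when()` returns the value of the first condition that matches, and the patterns are matched case-insensitively. A minimal sketch on made-up answers (the `toy` data is hypothetical), using `ignore.case = TRUE` as an alternative spelling of the inline `(?i)` flag:

```r
library(dplyr)

# Hypothetical free-text answers in both languages
toy <- tibble(usage_dirty = c(
  "Entrenamiento",
  "Health tracking",
  "Ver la hora",
  "Training and health"
))

toy <- mutate(toy, usage = case_when(
  grepl("entrenamiento|training", usage_dirty, ignore.case = TRUE) ~ "Training tracker",
  grepl("salud|health", usage_dirty, ignore.case = TRUE) ~ "Health tracker",
  grepl("hora|time", usage_dirty, ignore.case = TRUE) ~ "Watch usage",
  TRUE ~ usage_dirty  # anything unmatched keeps its original text
))

# The last answer mentions both training and health; the first rule wins
toy$usage
```

Because the order of the rules decides ties, the more specific categories should be listed before the more general ones.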

### Brand
```{r Check for distinct values in brand column}
unique(swu['brand_dirty'])
```

```{r Creating a new column with mutated values in English for brand}
swu <- mutate(swu, brand = case_when(
  
  grepl('I have Samsung and Colmi. Now I use Colmi', brand_dirty) ~ 'Colmi',
  grepl('(?i)Suunto', brand_dirty) ~ 'Suunto',
  grepl('Sinrelojinteligent|(?i)xxx', brand_dirty) ~ NA_character_,
  
  TRUE ~ brand_dirty
))
unique(swu['brand'])
```

## Group by demographic fields 
We need to create a new data frame for each multiple-value column, together with the demographic fields, so that each one can be analyzed individually.

### Features
'Features' is a question to find out the common uses that users give to their smartwatch as a device.  
Users could mark as many features as they use on their smartwatches, so we need to split the checked options into separate rows to process them.

```{r Creating a new dataset for features}
  features_dirty <- select(swu, timestamp, gender, age, location, periodicity, brand, features_dirty)
```

```{r Splitting features in different rows}
  features_rows <- separate_longer_delim(features_dirty, features_dirty, ', ')
```
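
To make the effect of the split concrete, here is a minimal sketch on a made-up table (the `toy` data and its `id` column are hypothetical). Each checked option becomes its own row and the other columns are repeated, so counts taken from the long table count selections rather than respondents.

```r
library(tidyr)
library(tibble)

# Two hypothetical respondents; the first one checked two options
toy <- tibble(
  id = c(1, 2),
  features_dirty = c("Alarm, Music", "Alarm")
)

# One row per checked option; `id` is repeated for respondent 1
toy_rows <- separate_longer_delim(toy, features_dirty, delim = ", ")
toy_rows
```

Note that the delimiter is a comma followed by a space, so a custom answer that itself contains ", " would also be split.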

```{r Check for distinct values in features column}
unique(features_rows['features_dirty'])
```

We will create two different columns for features: the first translates the values, and the second groups them into broader categories.
```{r Creating a new column with mutated values in English for features}
features_rows <- mutate(features_rows, features = case_when(
  grepl('(?i)deporte|(?i)sport|(?i)pasos', features_dirty) ~ 'Sports monitor',
  grepl('(?i)alarma|(?i)alarm', features_dirty) ~ 'Alarm',
  grepl('(?i)sedentary|(?i)sedentarismo', features_dirty) ~ 'Sedentary reminder',
  grepl('(?i)agua|(?i)water', features_dirty) ~ 'Water drink reminder',
  grepl('(?i)notificaciones|(?i)notifications', features_dirty) ~ 'Cell notifications',
  grepl('(?i)slack|(?i)text', features_dirty) ~ 'Text messages',
  grepl('(?i)calendar', features_dirty) ~ 'Calendar',
  grepl('(?i)música', features_dirty) ~ 'Music',
  grepl('(?i)cámara|(?i)camara', features_dirty) ~ 'Camera',
  grepl('(?i)teléfono|(?i)telefónicas', features_dirty) ~ 'Phone calls',
  grepl('(?i)voz', features_dirty) ~ 'Voice control',
  grepl('(?i)hora|(?i)time|(?i)watch', features_dirty) ~ 'Watch',
  grepl('(?i)pago', features_dirty) ~ 'Contactless payments',
  grepl('(?i)calorías|(?i)salud|(?i)estres|(?i)sueño|(?i)presion|(?i)cardiaca', features_dirty) ~ 'Health monitor',
  grepl('(?i)clima|(?i)atmosfericos|(?i)weather', features_dirty) ~ 'Weather monitor',
  
  TRUE ~ features_dirty
))
unique(features_rows['features'])
```


```{r Creating a new column with mutated values in English for features grouped by categories}
features_grouped_rows <- mutate(features_rows, features_grouped = case_when(
  grepl('(?i)sports|(?i)sport', features) ~ 'Sports monitor',
  grepl('(?i)alarm|(?i)sedentary|(?i)water', features) ~ 'Activity reminder',
  grepl('(?i)notifications|(?i)text|(?i)email|(?i)calendar', features) ~ 'Cell notifications',
  grepl('(?i)music|(?i)camera|(?i)phone|(?i)voice', features) ~ 'Cell control',
  grepl('(?i)time|(?i)watch', features) ~ 'Watch',
  grepl('(?i)payment|(?i)SOS|(?i)GPS', features) ~ 'Other features',
  grepl('(?i)calories|(?i)stress|(?i)sleep|(?i)presion|(?i)cardiac', features) ~ 'Health monitor',
  grepl('(?i)weather', features) ~ 'Weather monitor',
  
  TRUE ~ features
))
unique(features_grouped_rows['features_grouped'])
```
 
### Tracked functionalities
'Tracked functionalities' is a question to determine which metrics users track with their smartwatches. Users could mark as many options as they track, so we need to split the checked options into separate rows to process them.

```{r Creating a new dataset for Functionalities}
  functionalities_dirty <- select(swu, timestamp, gender, age, location, periodicity, brand, functionalities_dirty)
```

```{r Splitting Functionalities in different rows}
  functionalities_rows <- separate_longer_delim(functionalities_dirty, functionalities_dirty, ', ')
```

```{r Check for distinct values in Functionalities column}
unique(functionalities_rows['functionalities_dirty'])
```

```{r Creating a new column with mutated values in English for Functionalities}
functionalities_grouped_rows <- mutate(functionalities_rows, functionalities = case_when(
  grepl('(?i)deporte|(?i)sport|(?i)pasos', functionalities_dirty) ~ 'Sports',
  grepl('(?i)presión arterial|(?i)blood', functionalities_dirty) ~ 'Blood pressure',
  grepl('(?i)calorías|(?i)calorias|(?i)calories', functionalities_dirty) ~ 'Calories',
  grepl('(?i)distancia|(?i)distance', functionalities_dirty) ~ 'Distance',
  grepl('(?i)cardíaco|(?i)heart', functionalities_dirty) ~ 'Heart rate',
  grepl('(?i)sueño|(?i)sleep', functionalities_dirty) ~ 'Sleep',
  grepl('(?i)agua|(?i)water', functionalities_dirty) ~ 'Water',
  grepl('(?i)peso|(?i)weight', functionalities_dirty) ~ 'Weight',
  grepl('(?i)temperatura|(?i)temperature', functionalities_dirty) ~ 'Temperature',
  grepl('(?i)menstrual', functionalities_dirty) ~ 'Menstrual health',
  grepl('(?i)altitud|(?i)altitude|elevation', functionalities_dirty) ~ 'Altitude',
  grepl('(?i)oxigeno|(?i)oxígeno', functionalities_dirty) ~ 'Oxygen',
  grepl('(?i)estrés|(?i)stress', functionalities_dirty) ~ 'Stress',
  grepl('(?i)noise|(?i)ruido', functionalities_dirty) ~ 'Noise',
  grepl('(?i)hora', functionalities_dirty) ~ NA_character_,
  
  TRUE ~ functionalities_dirty
))
unique(functionalities_grouped_rows['functionalities'])
``` 

```{r Creating a new column with mutated grouped values in English for Functionalities}
functionalities_grouped_rows <- mutate(functionalities_grouped_rows, functionalities_grouped = case_when(
  grepl('(?i)sport|(?i)distance|(?i)altitude', functionalities) ~ 'Sports',
  grepl('(?i)blood|(?i)heart|(?i)sleep|(?i)temperature|(?i)oxygen|(?i)noise', functionalities) ~ 'Realtime health tracker',
  grepl('(?i)calories|(?i)water|(?i)weight|(?i)menstrual', functionalities) ~ 'Manual health tracker',
  grepl('(?i)stress|(?i)Mindfulness', functionalities) ~ 'Wellbeing',
  
  TRUE ~ functionalities
))
unique(functionalities_grouped_rows['functionalities_grouped'])
``` 
 
### Activities
'Activities' is a question to determine which sport activities smartwatch users track the most. Users could mark as many activities as they track, so we need to split the checked options into separate rows to process them.

```{r Creating a new dataset for Activities}
  activities_dirty <- select(swu, timestamp, gender, age, location, periodicity, brand, activities_dirty)
```

```{r Splitting Activities in different rows}
  activities_rows <- separate_longer_delim(activities_dirty, activities_dirty, ', ')
```

```{r Check for distinct values in Activities column}
unique(activities_rows['activities_dirty'])
```

```{r Creating a new column with mutated values in English for Activities}
activities_grouped_rows <- mutate(activities_rows, activities = case_when(
  grepl('(?i)caminata|(?i)pasos|(?i)walking|(?i)patinar|(?i)skating|(?i)bailar|(?i)dancing', activities_dirty) ~ 'Urban sports',
  grepl('(?i)correr|(?i)running|(?i)ciclismo|(?i)cycling|(?i)tenis|(?i)tennis|(?i)futbol|(?i)football|(?i)soccer|(?i)enduro', activities_dirty) ~ 'Professional sports',
  grepl('(?i)hiking|(?i)trakking|(?i)trekking|(?i)senderismo|(?i)splitboard|(?i)esqui|(?i)esquí|(?i)ski|(?i)escalada|(?i)climbing', activities_dirty) ~ 'Mountain sports',
  grepl('(?i)natación|(?i)natacion|(?i)swimming|(?i)buceo|(?i)diving', activities_dirty) ~ 'Water sports',
  grepl('(?i)gimnasio|(?i)gym|(?i)entrenamiento', activities_dirty) ~ 'General gym training',
  grepl('(?i)fuerza|(?i)crossfit|(?i)funcional|(?i)cross|(?i)strong|(?i)fitness|(?i)weight|(?i)strength|(?i)workouts', activities_dirty) ~ 'Strength and endurance sports',
  grepl('(?i)pilates|(?i)yoga|(?i)stretching|(?i)meditation', activities_dirty) ~ 'Relaxing sports',

  grepl('(?i)ninguna|(?i)hora', activities_dirty) ~ NA_character_,
  
  TRUE ~ activities_dirty
))
unique(activities_grouped_rows['activities'])
```  

# ANALYZE
We need to identify trends and relationships within the data so we can accurately answer the business questions.

```{r Installing necessary packages for graphs}
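# Install only the packages that are missing, to avoid reinstalling loaded packages
for (pkg in c('ggplot2', 'lessR', 'scales')) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
}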
library(ggplot2)
library(lessR)
library(scales)
```

## Demographic
We want to identify demographic trends in the sample:
* Gender
* Age
* Geographic location

### Gender

We identified two main genders and a third group for those who would rather not say.

```{r Gender distribution}
gender_table <- table(swu['gender'])

PieChart(gender_table, hole = 0, values = "%", main = "Gender distribution", fill = "reds")
```
### Age

We created age groups, with a range of 10 years each.

```{r Age distribution}
age_table <- table(swu['age'])

PieChart(age_table, hole = 0, values = "%", main = "Age distribution", fill = "reds")
```

### Geographic location

We want to know how smartwatch users are distributed across continents.

```{r Location - Map distribution}
# Tell sf to treat world map data as a 'flat' surface instead of a sphere
sf_use_s2(FALSE)

# Import world map, dissolve/union polygons by continent, and add bubble lon/lat
# locations for plotting
continents <- ne_countries(returnclass='sf') %>%
  # Russia has incorrect continent value, so need to change it
  mutate(continent = ifelse(sovereignt == "Russia", "Asia", continent)) %>%
  group_by(continent) %>%
  summarise(geom = st_union(geometry)) %>%
  filter(continent != "Seven seas (open ocean)") %>%
  mutate(centroid_lon = st_coordinates(st_centroid(.))[,1],
         centroid_lat = st_coordinates(st_centroid(.))[,2])

# dataset calculation
location_dirty <- select(swu, location)

locations<- location_dirty %>%
  group_by(location) %>% 
  summarise(count = n())

colnames(locations) <- c("continent", "count")

# Join count data to continents
continents <- left_join(continents, locations, by = "continent")

# Plot
ggplot(data = continents) +
  geom_sf() +
  # A constant colour belongs outside aes(); inside aes() it is treated as a data value
  geom_point(aes(x = centroid_lon, y = centroid_lat, size = count), color = "red") +
  scale_size(range = c(1, 10)) +
  labs(size = "Count", title = "Location distribution") +
  theme(axis.title = element_blank())

# It is a good habit to turn S2 back on after you are done
sf_use_s2(TRUE)

```
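
The join above only attaches counts where the survey's location label is spelled exactly like the map's continent name. A minimal sketch of how unmatched labels could be spotted with `anti_join()`; both tibbles are made-up, and the continent names are an assumption about how the map data labels continents (for example "Oceania"):

```r
library(dplyr)

# Assumed continent labels in the map data (hypothetical subset)
map_continents <- tibble(continent = c("Africa", "Europe", "Oceania", "South America"))

# Hypothetical survey counts after cleaning the location column
toy_locations <- tibble(
  continent = c("Europe", "South America", "Australia"),
  count = c(5, 12, 1)
)

# Rows of the survey counts that have no matching continent in the map
unmatched <- anti_join(toy_locations, map_continents, by = "continent")
unmatched
```

Any label listed in `unmatched` would be silently left off the map, so it is worth running a check like this before plotting.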

### Demographic relations

#### Age vs Gender

```{r Age - Dataset calculation, message=FALSE, warning=FALSE}
age_perc <- swu %>% 
  group_by(gender, age) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2)))
```
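
The percentage above is computed within each gender, not over the whole sample. This follows from how `summarise()` handles groups: it drops only the last grouping variable, so the result is still grouped by the first one when `mutate()` runs. A minimal sketch on made-up data (the `toy` values and age labels are hypothetical):

```r
library(dplyr)

toy <- tibble(
  gender = c("Female", "Female", "Female", "Male"),
  age    = c("26-35", "26-35", "36-45", "26-35")
)

toy_perc <- toy %>%
  group_by(gender, age) %>%
  # "drop_last" is the default for multiple groups; stating it silences the message
  summarise(count = n(), .groups = "drop_last") %>%
  # The data is still grouped by gender, so sum(count) is the per-gender total
  mutate(perc = count / sum(count))

toy_perc
```

In this toy example the shares add up to 1 inside each gender, which is what the stacked bar charts need.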

```{r Percentage distribution of age groups per gender}
# Chart
ggplot(age_perc, aes(x = factor(gender), y = perc, fill = factor(age))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "age", title = 'Distribution of Age per Gender (%)', subtitle = 'Stacked bars version') +
  theme_minimal(base_size = 14) +
  geom_text(data = age_perc, aes(y = perc, label = ratio), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Age groups"))
```

```{r Age - Grouped by Perc Gender, warning=TRUE}
ggplot(data = age_perc) + 
  geom_bar(
    aes(x = gender, y = perc, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = age),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Age per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Age - Grouped by counted Gender, warning=TRUE}
ggplot(data = age_perc) + 
  geom_bar(
    aes(x = gender, y = count, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = age),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Age per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```


#### Age vs Location

```{r Age - Location: Dataset calculation, message=FALSE, warning=FALSE}
age_loc_perc <- swu %>% 
  group_by(location, age) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2)))
```

```{r Percentage distribution of age groups per location}
# Chart
ggplot(age_loc_perc, aes(x = factor(location), y = perc, fill = factor(age))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "age", title = 'Distribution of Age per Location (%)', subtitle = 'Stacked bars version') +
  theme_minimal(base_size = 14) +
  geom_text(data = age_loc_perc, aes(y = perc, label = ratio), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Age groups"))
```

```{r Age - Grouped by Perc Location, warning=TRUE}
ggplot(data = age_loc_perc) + 
  geom_bar(
    aes(x = location, y = perc, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = age),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Age per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Age - Grouped by counted Location, warning=TRUE}
ggplot(data = age_loc_perc) + 
  geom_bar(
    aes(x = location, y = count, fill = age, group = age), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = age),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Age per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Gender vs Location

```{r Gender - Location: Dataset calculation, message=FALSE, warning=FALSE}
gender_loc_perc <- swu %>% 
  group_by(location, gender) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2)))
```

```{r Percentage distribution of gender per location}
# Chart
ggplot(gender_loc_perc, aes(x = factor(location), y = perc, fill = factor(gender))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "gender", title = 'Distribution of Gender per Location (%)', subtitle = 'Stacked bars version') +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_loc_perc, aes(y = perc, label = ratio), position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Gender")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Gender - Location: Grouped by counted Location, warning=TRUE}
ggplot(data = gender_loc_perc) + 
  geom_bar(
    aes(x = location, y = count, fill = gender, group = gender), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = gender),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  guides(fill = guide_legend(title = "Gender")) +
  labs(x = "Location", y = "Count", title = "Distribution of Gender per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```


## Smartwatch usage trends

We want to identify trends in the sample:
* Periodicity
* Brand
* Usage
* Features
* Functionalities
* Activities

### Periodicity

We want to identify how many days the users used their smartwatches.

```{r Periodicity - Dataset calculation, message=FALSE, warning=FALSE}
periodicity <- swu %>% 
  group_by(periodicity) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2)))
```

```{r Periodicity trends}
swu %>%
  ggplot(aes(x = factor(periodicity))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  theme(legend.position="none")+
  labs(x = "Days", y = "Count", title = 'Days of usage of smartwatches') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 1.5)
```

### Brand

We want to know which brands are the most common in the sample.

```{r Brand trends} 
# Reorder from highest frequency to lowest frequency
swu %>% drop_na(brand) %>% 
  ggplot(aes(x = fct_rev(fct_infreq(brand)))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  coord_flip() +
  theme(legend.position="none")+
  labs(x = "Brand", y = "Count", title = 'Popular brands of smartwatches') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 0.5, hjust = -0.1)
```

### Main usage

We want to identify the main use that users give to their smartwatches.

```{r Main usage trends} 
# Reorder from highest frequency to lowest frequency
swu %>%
  ggplot(aes(x = fct_rev(fct_infreq(usage)))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  coord_flip() +
  theme(legend.position="none")+
  labs(x = "Main usage", y = "Count", title = 'Main usage of smartwatches') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 0.5, hjust = -0.1)
```

### Features

We want to identify which specific smartwatch features users use the most.

```{r Features trends} 
# Reorder from highest frequency to lowest frequency
features_rows %>%
  ggplot(aes(x = fct_rev(fct_infreq(features)))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  coord_flip() +
  theme(legend.position="none")+
  labs(x = "Features", y = "Count", title = 'Popular features of smartwatches') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 0.5, hjust = -0.1)
```

We create larger categories from features to get a bigger picture of the trends

```{r Features grouped trends} 
# Reorder from highest frequency to lowest frequency
features_grouped_rows %>%
  ggplot(aes(x = fct_rev(fct_infreq(features_grouped)))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  coord_flip() +
  theme(legend.position="none")+
  labs(x = "Features", y = "Count", title = 'Popular grouped features of smartwatches') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 0.5, hjust = -0.1)
```

### Functionalities

We want to identify which functionalities users track the most with their smartwatches.

```{r Tracked functionalities trends} 
# Reorder from highest frequency to lowest frequency
functionalities_grouped_rows %>% drop_na(functionalities) %>% 
  ggplot(aes(x = fct_rev(fct_infreq(functionalities)))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  coord_flip() +
  theme(legend.position="none")+
  labs(x = "Tracked functionalities", y = "Count", title = 'Popular tracked functionalities of smartwatches') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 0.5, hjust = -0.1)
```

We create larger categories from functionalities to get a bigger picture of the trends

```{r Functionalities grouped trends} 
# Reorder from highest frequency to lowest frequency
functionalities_grouped_rows %>% drop_na(functionalities_grouped) %>% 
  ggplot(aes(x = fct_rev(fct_infreq(functionalities_grouped)))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  coord_flip() +
  theme(legend.position="none")+
  labs(x = "Functionalities", y = "Count", title = 'Popular grouped functionalities of smartwatches') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 0.5, hjust = 1.1)
```


### User activities

We want to know which sport activities users track the most with their smartwatches.

```{r User activities trends} 
# Reorder from highest frequency to lowest frequency
activities_grouped_rows %>% drop_na(activities) %>% 
  ggplot(aes(x = fct_rev(fct_infreq(activities)))) +
  # Constant styling goes outside aes() so the bars are actually drawn in orange
  geom_bar(fill = "orange", alpha = .6, width = .4) +
  coord_flip() +
  theme(legend.position="none")+
  labs(x = "Sport activities", y = "Count", title = 'Popular sport activities tracked by users') +
  geom_text(aes(label = after_stat(count)), stat = "count", vjust = 0.5, hjust = 1.1)
```

## Establishing relations between variables

### Periodicity

#### Periodicity vs Gender

We want to identify whether there is a relationship between how often a smartwatch is used and the user's gender.

```{r Periodicity - Gender: dataset calculation, message=FALSE, warning=FALSE}
periodicity_gender <- swu %>% 
  group_by(gender, periodicity) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(periodicity)
```
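
One detail of the calculation above is worth keeping in mind: the missing values are dropped after the percentages are computed, so respondents without a periodicity answer still count towards each total. The sketch below, on made-up data (`toy` is hypothetical), compares both orderings; which one is appropriate depends on whether non-respondents should be part of the denominator.

```r
library(dplyr)
library(tidyr)

# Four hypothetical respondents, one of them without a periodicity answer
toy <- tibble(gender = "Female", periodicity = c(7, 7, 3, NA))

# Same order as the chunk above: compute shares first, then drop the NA row
na_after <- toy %>%
  group_by(gender, periodicity) %>%
  summarise(count = n(), .groups = "drop_last") %>%
  mutate(perc = count / sum(count)) %>%
  drop_na(periodicity)

# Alternative order: drop the NA row first, then compute shares
na_before <- toy %>%
  drop_na(periodicity) %>%
  group_by(gender, periodicity) %>%
  summarise(count = n(), .groups = "drop_last") %>%
  mutate(perc = count / sum(count))

sum(na_after$perc)   # 0.75: the missing answer still sits in the denominator
sum(na_before$perc)  # 1: shares are relative to answered responses only
```

With the first ordering, the labels inside a stacked bar may not add up to 100% even though `position = "fill"` stretches the bar to full height.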

```{r Periodicity - Gender: Percentage, warning=TRUE}
# Chart
ggplot(periodicity_gender, aes(x = factor(gender), y = perc*100, fill = factor(periodicity))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "periodicity", title = "Percentage distribution of Periodicity per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = periodicity_gender, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Periodicity"))
```

```{r Periodicity - Gender: Grouped by Perc Gender, warning=TRUE}
# Chart
ggplot(data = periodicity_gender) + 
  geom_bar(
    aes(x = gender, y = perc, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = periodicity),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Periodicity per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Periodicity - Gender: Grouped by counted Gender, warning=TRUE}
ggplot(data = periodicity_gender) + 
  geom_bar(
    aes(x = gender, y = count, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = periodicity),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Periodicity per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Periodicity vs Age

We want to identify whether there is a relationship between how often a smartwatch is used and the user's age.

```{r Periodicity - Age: dataset calculation, message=FALSE, warning=FALSE}
periodicity_age <- swu %>% 
  group_by(age, periodicity) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(periodicity)
```

```{r Periodicity - Age: Percentage, warning=TRUE}
# Chart
ggplot(periodicity_age, aes(x = factor(age), y = perc*100, fill = factor(periodicity))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "periodicity", title = "Percentage distribution of Periodicity per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = periodicity_age, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Periodicity"))
```

```{r Periodicity - Age: Grouped by Perc Age, warning=TRUE}
# Chart
ggplot(data = periodicity_age) + 
  geom_bar(
    aes(x = age, y = perc, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = periodicity),
    position = position_dodge(width = 1),
    vjust = -0.5, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Periodicity per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Periodicity - Age: Grouped by counted Age, warning=TRUE}
ggplot(data = periodicity_age) + 
  geom_bar(
    aes(x = age, y = count, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = periodicity),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Periodicity per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Periodicity vs Location

We want to identify whether there is a relationship between how often a smartwatch is used and the user's location.

```{r Periodicity - Location: dataset calculation, message=FALSE, warning=FALSE}
periodicity_location <- swu %>% 
  group_by(location, periodicity) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(periodicity)
```

```{r Periodicity - Location: Percentage, warning=TRUE}
# Chart
ggplot(periodicity_location, aes(x = factor(location), y = perc*100, fill = factor(periodicity))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "periodicity", title = "Percentage distribution of Periodicity per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = periodicity_location, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Periodicity")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Periodicity - Location: Grouped by Perc Location, warning=TRUE}
# Chart
ggplot(data = periodicity_location) + 
  geom_bar(
    aes(x = location, y = perc, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = periodicity),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Periodicity per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Periodicity - Location: Grouped by counted Location, warning=TRUE}
ggplot(data = periodicity_location) + 
  geom_bar(
    aes(x = location, y = count, fill = periodicity, group = periodicity), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = periodicity),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Periodicity per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```


### Brand

#### Brand vs Gender

We want to identify whether there is a relationship between the brand of a smartwatch and the user's gender.

```{r Brand - Gender: dataset calculation, message=FALSE, warning=FALSE}
brand_gender <- swu %>% 
  group_by(gender, brand) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(brand)
```

```{r Brand - Gender: Percentage, warning=TRUE}
# Chart
ggplot(brand_gender, aes(x = factor(gender), y = perc * 100, fill = factor(brand))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "brand", title = "Percentage distribution of Brand per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = brand_gender, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Brand"))
```

```{r Brand - Gender: Grouped by Perc Gender, warning=TRUE}
# Chart
ggplot(data = brand_gender) + 
  geom_bar(
    aes(x = gender, y = perc, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = brand),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Brand per gender (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Brand - Gender: Grouped by counted Gender, warning=TRUE}
ggplot(data = brand_gender) + 
  geom_bar(
    aes(x = gender, y = count, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = brand),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Brand per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Brand vs Age

We want to identify whether there is a relationship between the brand of a smartwatch and the user's age.

```{r Brand - Age: dataset calculation, message=FALSE, warning=FALSE}
brand_age <- swu %>% 
  group_by(age, brand) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(brand)
```

```{r Brand - Age: Percentage, warning=TRUE}
# Chart
ggplot(brand_age, aes(x = factor(age), y = perc * 100, fill = factor(brand))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "brand", title = "Percentage distribution of Brand per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = brand_age, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Brand"))
```
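
A short sketch, on hypothetical data, of why the stacked chart above works with a percent-formatted axis: `position = "fill"` rescales every stack to a total height of 1, so multiplying `perc` by 100 does not change the bars. The `toy` data frame and the plot names are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data, not taken from the survey
toy <- data.frame(
  age   = c("18-25", "18-25", "26-35", "26-35"),
  brand = c("Apple", "Garmin", "Apple", "Garmin"),
  perc  = c(0.75, 0.25, 0.4, 0.6)
)

p_raw <- ggplot(toy, aes(x = age, y = perc, fill = brand)) +
  geom_bar(stat = "identity", position = "fill")

p_scaled <- ggplot(toy, aes(x = age, y = perc * 100, fill = brand)) +
  geom_bar(stat = "identity", position = "fill")

# Compare the computed bar tops of both versions
raw_top    <- layer_data(p_raw)$ymax
scaled_top <- layer_data(p_scaled)$ymax
isTRUE(all.equal(raw_top, scaled_top))
```

Because the filled stacks always span 0 to 1, `scales::percent_format()` on the y axis displays them as 0% to 100%.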

```{r Brand - Age: Grouped by Perc Age, warning=TRUE}
# Chart
ggplot(data = brand_age) + 
  geom_bar(
    aes(x = age, y = perc, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = brand),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Brand per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Brand - Age: Grouped by counted Age, warning=TRUE}
ggplot(data = brand_age) + 
  geom_bar(
    aes(x = age, y = count, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = brand),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Brand per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Brand vs Location

We want to identify whether there is a relation between the brand of a smartwatch and the user's location.

```{r Brand - Location: dataset calculation, message=FALSE, warning=FALSE}
brand_location <- swu %>% 
  group_by(location, brand) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(brand)
```

```{r Brand - Location: Percentage, warning=TRUE}
# Chart
ggplot(brand_location, aes(x = factor(location), y = perc * 100, fill = factor(brand))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "brand", title = "Percentage distribution of Brand per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = brand_location, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Brand"))
```

```{r Brand - Location: Grouped by Perc Location, warning=TRUE}
# Chart
ggplot(data = brand_location) + 
  geom_bar(
    aes(x = location, y = perc, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = brand),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Brand per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Brand - Location: Grouped by counted Location, warning=TRUE}
ggplot(data = brand_location) + 
  geom_bar(
    aes(x = location, y = count, fill = brand, group = brand), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = brand),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Brand per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

### Usage

#### Usage vs Gender

We want to identify whether there is a relation between the usage of a smartwatch and the user's gender.

```{r Usage - Gender: dataset calculation, message=FALSE, warning=FALSE}
gender_usage <- swu %>% 
  group_by(gender, usage) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(usage)
```

```{r Usage - Gender: Percentage, warning=TRUE}
# Chart
ggplot(gender_usage, aes(x = factor(gender), y = perc * 100, fill = factor(usage))) +
  geom_bar(stat ="identity", width = 0.7, position = "fill") +
  labs(x = "Gender", y = "Percent", fill = "usage", title = "Percentage distribution of Usage per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_usage, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title="Usage"))
```

```{r Usage - Gender: Grouped by Perc Gender, warning=TRUE}
# Chart
ggplot(data = gender_usage) + 
  geom_bar(
    aes(x = gender, y = perc, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = usage),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Usage per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Usage - Gender: Grouped by counted Gender, warning=TRUE}
ggplot(data = gender_usage) + 
  geom_bar(
    aes(x = gender, y = count, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = usage),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Usage per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```
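
The charts above compare the distributions visually. As a complementary check, one could run a chi-squared test of independence on the contingency table of the two variables. The following is only a sketch on hypothetical data (the `toy` data frame is made up); with the small groups typical of a short survey, expected counts can be low, in which case `chisq.test(..., simulate.p.value = TRUE)` or `fisher.test()` may be more appropriate.

```r
# Hypothetical toy data, not survey results
toy <- data.frame(
  gender = rep(c("Female", "Male"), each = 20),
  usage  = c(rep("Sport", 14), rep("Notifications", 6),
             rep("Sport", 8),  rep("Notifications", 12))
)

# Contingency table: rows = gender, columns = usage
tab <- table(toy$gender, toy$usage)

# Chi-squared test of independence (stats package, loaded by default)
test <- chisq.test(tab)
test$p.value
```

A small p-value would suggest that usage and gender are not independent in the sample. It says nothing about the direction or size of the difference, so the charts remain the main tool for reading the trend.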

#### Usage vs Age

We want to identify whether there is a relation between the usage of a smartwatch and the user's age.

```{r Usage - Age: dataset calculation, message=FALSE, warning=FALSE}
age_usage <- swu %>% 
  group_by(age, usage) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(usage)
```

```{r Usage - Age: Percentage}
# Chart
ggplot(age_usage, aes(x = factor(age), y = perc * 100, fill = factor(usage))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "usage", title = "Percentage distribution of Usage per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_usage, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Usage")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Usage - Age: Grouped by Perc Age, warning=TRUE}
# Chart
ggplot(data = age_usage) + 
  geom_bar(
    aes(x = age, y = perc, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = usage),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Usage per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Usage - Age: Grouped by counted Age, warning=TRUE}
ggplot(data = age_usage) + 
  geom_bar(
    aes(x = age, y = count, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = usage),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Usage per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Usage vs Location

We want to identify whether there is a relation between the usage of a smartwatch and the user's location.

```{r Usage - Location: dataset calculation, message=FALSE, warning=FALSE}
location_usage <- swu %>% 
  group_by(location, usage) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(usage)
```

```{r Usage - Location: Percentage}
# Chart
ggplot(location_usage, aes(x = factor(location), y = perc * 100, fill = factor(usage))) +
  geom_bar(stat="identity", width = 0.7, position = "fill") +
  labs(x = "Location", y = "Percent", fill = "usage", title = "Percentage distribution of Usage per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = location_usage, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title = "Usage")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Usage - Location: Grouped by Perc Location, warning=TRUE}
# Chart
ggplot(data = location_usage) + 
  geom_bar(
    aes(x = location, y = perc, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = usage),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Usage per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Usage - Location: Grouped by counted Location, warning=TRUE}
ggplot(data = location_usage) + 
  geom_bar(
    aes(x = location, y = count, fill = usage, group = usage), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = usage),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Usage per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

### Features

#### Features vs Gender

We want to identify whether there is a relation between the smartwatch features used and the user's gender.

```{r Features - Gender: dataset calculation, message=FALSE, warning=FALSE}
gender_features <- features_grouped_rows %>% 
  group_by(gender, features_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(features_grouped)
```
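
The same count-and-share summary is repeated for every pair of variables in this section. As a sketch of how it could be factored out, the helper below wraps the pipeline using dplyr's `{{ }}` embracing. The function name `cross_summary` and its arguments are hypothetical and not part of the original analysis; the `toy` data is made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: count `category` within `group` and add within-group shares
cross_summary <- function(data, group, category) {
  data %>%
    group_by({{ group }}, {{ category }}) %>%
    summarise(count = n(), .groups = "drop_last") %>%  # stays grouped by `group`
    mutate(
      perc  = count / sum(count),
      ratio = scales::percent(round(perc, 2))
    ) %>%
    drop_na({{ category }})
}

# Made-up example data, not survey results
toy <- data.frame(
  gender           = c("Female", "Female", "Female", "Male"),
  features_grouped = c("Health", "Health", "Sport", "Sport")
)

res <- cross_summary(toy, gender, features_grouped)
res
```

With the survey data, a call such as `cross_summary(features_grouped_rows, gender, features_grouped)` should produce the same columns as `gender_features` above.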

```{r Features - Gender: Percentage}
# Chart
ggplot(gender_features, aes(x = factor(gender), y = perc * 100, fill = factor(features_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Gender", y = "Percent", fill = "features_grouped", title = "Percentage distribution of Features per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_features, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Features"))
```

```{r Features - Gender: Grouped by Perc Gender, warning=TRUE}
# Chart
ggplot(data = gender_features) + 
  geom_bar(
    aes(x = gender, y = perc, fill = features_grouped, group = features_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Features per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Features - Gender: Grouped by counted Gender, warning=TRUE}
ggplot(data = gender_features) + 
  geom_bar(
    aes(x = gender, y = count, fill = features_grouped, group = features_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Features per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Features vs Age

We want to identify whether there is a relation between the smartwatch features used and the user's age.

```{r Features - Age: dataset calculation, message=FALSE, warning=FALSE}
age_features <- features_grouped_rows %>% 
  group_by(age, features_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(features_grouped)
```

```{r Features - Age: Percentage}
# Chart
ggplot(age_features, aes(x = factor(age), y = perc * 100, fill = factor(features_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Age", y = "Percent", fill = "features_grouped", title = "Percentage distribution of Features per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_features, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Features")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Features - Age: Grouped by Perc Age, warning=TRUE}
# Chart
ggplot(data = age_features) + 
  geom_bar(
    aes(x = age, y = perc, fill = features_grouped, group = features_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Features per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Features - Age: Grouped by counted Age, warning=TRUE}
ggplot(data = age_features) + 
  geom_bar(
    aes(x = age, y = count, fill = features_grouped, group = features_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Features per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Features vs Location

We want to identify whether there is a relation between the smartwatch features used and the user's location.

```{r Features - Location: dataset calculation, message=FALSE, warning=FALSE}
location_features <- features_grouped_rows %>% 
  group_by(location, features_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>%
  drop_na(features_grouped)
```

```{r Features - Location: Percentage}
# Chart
ggplot(location_features, aes(x = factor(location), y = perc * 100, fill = factor(features_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Location", y = "Percent", fill = "features_grouped", title = "Percentage distribution of Features per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = location_features, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Features")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Features - Location: Grouped by Perc Location, warning=TRUE}
# Chart
ggplot(data = location_features) + 
  geom_bar(
    aes(x = location, y = perc, fill = features_grouped, group = features_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Features per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Features - Location: Grouped by counted Location, warning=TRUE}
ggplot(data = location_features) + 
  geom_bar(
    aes(x = location, y = count, fill = features_grouped, group = features_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = features_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Features per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```
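
The grouped bar charts in this section all follow the same recipe. Below is a sketch of a reusable plotting function; the name `plot_grouped` and its arguments are hypothetical, and the `toy` data is made up. One deliberate difference from the chunks above: the bars and the labels share a single `position_dodge(width = 0.9)`, which matches the default bar width so each label sits centered over its bar.

```r
library(ggplot2)

# Hypothetical helper: faceted, dodged bar chart with a label on top of each bar
plot_grouped <- function(data, x, y, fill, label, title = NULL) {
  dodge <- position_dodge(width = 0.9)  # one dodge for bars and labels keeps them aligned
  ggplot(data, aes(x = {{ x }}, y = {{ y }}, fill = {{ fill }}, group = {{ fill }})) +
    geom_col(position = dodge) +
    geom_text(aes(label = {{ label }}), position = dodge, vjust = -0.3, size = 3) +
    facet_wrap(vars({{ x }}), scales = "free_x") +
    labs(title = title, subtitle = "Grouped bars version") +
    theme_bw()
}

# Made-up example data, not survey results
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male"),
  brand  = c("Apple", "Garmin", "Apple", "Garmin"),
  count  = c(5, 3, 4, 6)
)

p <- plot_grouped(toy, gender, count, brand, count, title = "Toy example")
p
```

For the percentage variants, `scale_y_continuous(labels = scales::percent_format())` can be added to the returned plot with `+`, as in the chunks above.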

### Functionalities

#### Functionalities vs Gender

We want to identify whether there is a relation between the functionalities tracked with a smartwatch and the user's gender.

```{r Functionalities - Gender: dataset calculation, message=FALSE, warning=FALSE}
gender_functionalities <- functionalities_grouped_rows %>% 
  group_by(gender, functionalities_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(functionalities_grouped)
```

```{r Functionalities - Gender: Percentage}
# Chart
ggplot(gender_functionalities, aes(x = factor(gender), y = perc * 100, fill = factor(functionalities_grouped))) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Gender", y = "Percent", fill = "functionalities_grouped", title = "Percentage distribution of Functionalities per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_functionalities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill = guide_legend(title = "Functionalities"))
```

```{r Functionalities - Gender: Grouped by Perc Gender, warning=TRUE}
# Chart
ggplot(data = gender_functionalities) + 
  geom_bar(
    aes(x = gender, y = perc, fill = functionalities_grouped, group = functionalities_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = perc, label = ratio, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Gender", y = "Percentage", title = "Distribution of Functionalities per Gender (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Functionalities - Gender: Grouped by counted Gender, warning=TRUE}
ggplot(data = gender_functionalities) + 
  geom_bar(
    aes(x = gender, y = count, fill = functionalities_grouped, group = functionalities_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ gender, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = gender, y = count, label = count, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Gender", y = "Count", title = "Distribution of Functionalities per Gender (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Functionalities vs Age

We want to identify whether there is a relation between the functionalities tracked with a smartwatch and the user's age.

```{r Functionalities - Age: dataset calculation, message=FALSE, warning=FALSE}
age_functionalities <- functionalities_grouped_rows %>% 
  group_by(age, functionalities_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(functionalities_grouped)
```

```{r Functionalities - Age: Percentage}
# Chart
ggplot(age_functionalities, aes(x = factor(age), y = perc*100, fill = factor(functionalities_grouped))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "functionalities_grouped", title = "Percentage distribution of Functionalities per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_functionalities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Functionalities")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Functionalities - Age: Grouped by Perc Age, warning=TRUE}
# Chart
ggplot(data = age_functionalities) + 
  geom_bar(
    aes(x = age, y = perc, fill = functionalities_grouped, group = functionalities_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = perc, label = ratio, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Age", y = "Percentage", title = "Distribution of Functionalities per Age (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Functionalities - Age: Grouped by counted Age, warning=TRUE}
ggplot(data = age_functionalities) + 
  geom_bar(
    aes(x = age, y = count, fill = functionalities_grouped, group = functionalities_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ age, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = age, y = count, label = count, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Age", y = "Count", title = "Distribution of Functionalities per Age (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

#### Functionalities vs Location

We want to identify whether there is a relation between the functionalities tracked with a smartwatch and the user's location.

```{r Functionalities - Location: dataset calculation, message=FALSE, warning=FALSE}
locacion_functionalities <- functionalities_grouped_rows %>% 
  group_by(location, functionalities_grouped) %>% 
  summarise(count = n()) %>% 
  mutate(perc = count/sum(count), ratio = scales::percent(round(perc, 2))) %>% 
  drop_na(functionalities_grouped)
```

```{r Functionalities - Location: Percentage}
# Chart
ggplot(locacion_functionalities, aes(x = factor(location), y = perc*100, fill = factor(functionalities_grouped))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Location", y = "Percent", fill = "functionalities_grouped", title = "Percentage distribution of Functionalities per Location") +
  theme_minimal(base_size = 14) +
  geom_text(data = locacion_functionalities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Functionalities")) +
  theme(axis.text.x = element_text(angle = 45))
```

```{r Functionalities - Location: Grouped by Perc Location, warning=TRUE}
# Chart
ggplot(data = locacion_functionalities) + 
  geom_bar(
    aes(x = location, y = perc, fill = functionalities_grouped, group = functionalities_grouped), 
    stat = 'identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = perc, label = ratio, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  labs(x = "Location", y = "Percentage", title = "Distribution of Functionalities per Location (%)", subtitle = "Grouped bars version") +
  theme_bw()
```

```{r Functionalities - Location: Grouped by counted Location, warning=TRUE}
ggplot(data = locacion_functionalities) + 
  geom_bar(
    aes(x = location, y = count, fill = functionalities_grouped, group = functionalities_grouped), 
    stat='identity', position = 'dodge'
  ) +
  facet_wrap(~ location, scales = "free_x", drop = TRUE) +
  geom_text(
    aes(x = location, y = count, label = count, group = functionalities_grouped),
    position = position_dodge(width = 1),
    vjust = 0.1, size = 3
  ) +
  labs(x = "Location", y = "Count", title = "Distribution of Functionalities per Location (Count)", subtitle = "Grouped bars version") +
  theme_bw()
```

### Activities

#### Activities vs Gender

We want to identify whether there is a relation between the sport activities users track and their gender.

```{r Gender - Activities relation}
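# Count each tracked activity per gender and compute its share within that gender
gender_activities <- activities_grouped_rows %>% 
  group_by(gender, activities) %>% 
  summarise(count = n(), .groups = "drop_last") %>%  # stay grouped by gender, no regrouping message
  mutate(perc = count / sum(count), ratio = scales::percent(perc)) %>% 
  drop_na(activities)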

# Chart
ggplot(gender_activities, aes(x = factor(gender), y = perc*100, fill = factor(activities))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Gender", y = "Percent", fill = "activities", title = "Percentage distribution of 'Activities' per Gender") +
  theme_minimal(base_size = 14) +
  geom_text(data = gender_activities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Activities"))
```

#### Activities vs Age

We want to identify whether there is a relation between the sport activities users track and their age.

```{r Age - Activities relation}
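# Count each tracked activity per age group and compute its share within that age group
age_activities <- activities_grouped_rows %>% 
  group_by(age, activities) %>% 
  summarise(count = n(), .groups = "drop_last") %>%  # stay grouped by age, no regrouping message
  mutate(perc = count / sum(count), ratio = scales::percent(perc)) %>% 
  drop_na(activities)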

# Chart
ggplot(age_activities, aes(x = factor(age), y = perc*100, fill = factor(activities))) +
  geom_bar(stat="identity", width = 0.7, position="fill") +
  labs(x = "Age", y = "Percent", fill = "activities", title = "Percentage distribution of 'Activities' per Age") +
  theme_minimal(base_size = 14) +
  geom_text(data = age_activities, aes(y = count, label = ratio), position = position_fill(vjust = 0.5)) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 5L)) +
  guides(fill=guide_legend(title="Activities")) +
  theme(axis.text.x = element_text(angle = 45))
```